Member

@DJMcNab DJMcNab commented Oct 14, 2025

The core contributions of this PR are:

  • A trait which a (should be zero-sized) struct can implement, which indicates that it is a type-level proof that a set of target features are enabled.
  • The trampoline macro, which validates a #[target_feature(enable = "xxx")] string against values of one or more of these, ensuring at compile time that a call to a #[target_feature] function will be safe; and then calling it.
  • A corresponding struct for each target feature on x86[-64], which are code generated.

The state of this feature is:

  • It is not used for implementing the Fearless SIMD crate.
  • The x86-64-v{1,2,3,4} level implementations do not exist/are extremely incomplete.
  • Some docs are missing (these are however not the most critical docs, it's only docs on the groupings of x86 features).
  • It does not have support for aarch64 in the architecture levels. This is not hard, it's just data wrangling.

There is also an open licensing question, around the docs taken from the Rust reference. My preference would be to copy https://github.com/rust-lang/reference/blob/1d930e1d5a27e114b4d22a50b0b6cd3771b92e31/LICENSE-MIT#L1 into our LICENSE-MIT, which avoids having to make a decision about copyright-ability here.

My proposed next steps are:

  • Discuss this at Renderer Office Hours tomorrow: Done
  • If we decide this is a direction we want to follow, clean up and land this PR.
  • Follow-up with:
    • aarch64 support
    • Automatic selection/an enum of x86-64 levels
    • Using it in the implementation of Fearless SIMD itself

For review:

  • You can mostly ignore the contents of fearless_simd_core/x86/xxx/xxx.rs, as these are entirely automatically generated. The exceptions are the fearless_simd_core/x86/xxx/mod.rs files, which are hand-written but don't contain any logic.

Discussed on Zulip: #simd > Removing `safe-wrappers`

Member Author

This file (and trampoline.rs) contains the main code needed to understand this PR.

/// See the module level docs [self].
///
/// We require static lifetimes as this is primarily internal to the macro.
pub const fn is_feature_subset<const N: usize>(

Member Author

This function needs the most careful review, because its correctness is being relied upon for safety.
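
As a rough illustration of the kind of check under review here, a compile-time subset test over feature names could look like the sketch below. This is not the PR's `is_feature_subset` (its full signature is not shown above); the names `str_eq`, `contains_feature` and `is_subset_sketch` are hypothetical, and the sketch assumes features are compared as exact strings.

```rust
/// Byte-wise string equality. Written by hand because `==` on `str`
/// goes through a trait, which stable `const fn` cannot call at the time of writing.
const fn str_eq(a: &str, b: &str) -> bool {
    let (a, b) = (a.as_bytes(), b.as_bytes());
    if a.len() != b.len() {
        return false;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// Whether `feature` appears in `available` (hypothetical helper).
const fn contains_feature(available: &[&str], feature: &str) -> bool {
    let mut i = 0;
    while i < available.len() {
        if str_eq(available[i], feature) {
            return true;
        }
        i += 1;
    }
    false
}

/// Whether every entry of `required` also appears in `available`.
pub const fn is_subset_sketch(required: &[&str], available: &[&str]) -> bool {
    let mut i = 0;
    while i < required.len() {
        if !contains_feature(available, required[i]) {
            return false;
        }
        i += 1;
    }
    true
}

// Evaluated at compile time: the build fails if this assertion does not hold.
const _: () = assert!(is_subset_sketch(&["sse"], &["sse", "sse2"]));
```

Because a check like this gates a call that would otherwise be `unsafe`, a false positive would be a soundness bug, which is presumably why the real function deserves the closest review.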

Member Author

DJMcNab commented Oct 15, 2025

The "glamour shot" of this PR is that given:

use core::arch::x86_64::{__m128, _mm_mul_ps};

#[target_feature(enable = "sse")]
fn sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let a: __m128 = bytemuck::must_cast(a);
    let b: __m128 = bytemuck::must_cast(b);
    bytemuck::must_cast(_mm_mul_ps(a, b))
}

You can run:

let Some(sse) = x86::v1::Sse::try_new() else {
    panic!("Example code")
};
let a = [10_f32, 20_f32, 30_f32, 40_f32];
let b = [4_f32, 5_f32, 6_f32, 7_f32];
// The shorthand form of the macro:
let res =
    trampoline!(Sse = sse => "sse", sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4]);
assert_eq!(res, [40_f32, 100_f32, 180_f32, 280_f32]);

To entirely safely and soundly use Rust's SIMD intrinsics.


To help guide review, the core contribution of this PR is a way to talk about target features in the type system. This is implemented through this trait:

/// Token that a set of target features is available.
///
/// Note that this trait is only meaningful when there are values of this type.
/// That is, to enable the target features in `FEATURES`, you *must* have a value
/// of this type.
///
/// Values which implement this trait are used in the second argument to [`trampoline!`],
/// which is a safe abstraction over enabling target features.
///
/// # Safety
///
/// To construct a value of a type implementing this trait, you must have proven that each
/// target feature in `FEATURES` is available.
pub unsafe trait TargetFeatureToken: Copy {
    /// The set of target features which the current CPU has, if
    /// you have a value of this type.
    const FEATURES: &[&str];
    /// Enable the target features in `FEATURES` for a single run of `f`, and run it.
    ///
    /// `f` must be marked `#[inline(always)]` for this to work.
    ///
    /// Note that this does *not* enable the target features on the Rust side (e.g. for calling).
    /// To do so, you should instead use [`trampoline!`] directly - this is a convenience wrapper around `trampoline`
    /// for cases where the dispatch of simd values is handled elsewhere.
    fn vectorize<R>(self, f: impl FnOnce() -> R) -> R;
}

Implementing TargetFeatureToken indicates that a token represents one or more target features being enabled. Tokens can be passed to the new trampoline! macro to safely run code in a #[target_feature(enable = "...")] context. The macro validates the user-provided target feature string, making sure that the provided tokens justify executing that function. An example of these being used is:

let a = [10_f32, 20_f32, 30_f32, 40_f32];
let b = [4_f32, 5_f32, 6_f32, 7_f32];
// The shorthand form of the macro:
let res =
    trampoline!(Sse = sse => "sse", sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4]);
assert_eq!(res, [40_f32, 100_f32, 180_f32, 280_f32]);

In this example, the call to the SSE multiplication function is proven to be safe, and then executed.
The contents of fearless_simd_core/lib.rs are the core contribution of this PR, plus the infra code in trampoline.rs which makes it work.

Separately, in this PR, we have the functionality for properly using this on the x86_64 (and also plain x86) architectures. This is the contents of the x86 folder. This involves:

  • A token struct for each target feature which Rust supports, with the trivially correct safety checks for constructing them.
  • A struct for each of x86-64-v{1,2,3,4}, which are the micro-architecture levels of x86. These levels are in v1/level.rs, etc.
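
To make the idea of a token struct concrete, here is a minimal sketch of what one such type could look like. It is an assumption-laden illustration, not the generated code from this PR: `FeatureToken` and `SseToken` are hypothetical names, and the runtime check uses the standard library's `is_x86_feature_detected!` macro.

```rust
/// Hypothetical stand-in for the PR's token trait.
///
/// # Safety
/// A value of an implementing type may only exist if every feature in
/// `FEATURES` has been shown to be available on the current CPU.
pub unsafe trait FeatureToken: Copy {
    const FEATURES: &'static [&'static str];
}

/// Zero-sized proof that SSE is available (hypothetical name).
#[derive(Clone, Copy, Debug)]
pub struct SseToken {
    // Private field: code outside this module cannot construct the token directly.
    _private: (),
}

impl SseToken {
    /// Returns a token only if the CPU reports SSE support at runtime.
    pub fn try_new() -> Option<Self> {
        #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
        {
            if std::is_x86_feature_detected!("sse") {
                return Some(Self { _private: () });
            }
        }
        // Not an x86 target, or the feature was not detected.
        None
    }
}

// SAFETY: the only constructor, `try_new`, checks for SSE before returning a value.
unsafe impl FeatureToken for SseToken {
    const FEATURES: &'static [&'static str] = &["sse"];
}
```

The private field is what makes the "you must have a value" rule meaningful: in this sketch, the only way to obtain an `SseToken` is through the checked constructor.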

Every file in that folder (except for mod.rs files) is automatically generated by the binary crate of the fearless_simd_core/gen package (after rustfmt is run). As such, there isn't really any significant logic in those files.

@DJMcNab DJMcNab marked this pull request as ready for review October 16, 2025 14:27
@ajakubowicz-canva ajakubowicz-canva self-requested a review October 19, 2025 23:42
@taj-p taj-p self-requested a review October 22, 2025 20:07
@taj-p taj-p left a comment

Soundness makes sense to me! Let me know if these TODOs are deliberate, once you've solved the question of whether to split out x86 into x86 and x86_64, and I'm happy to approve

Comment on lines 26 to 28
//! These examples use [bytemuck](https://crates.io/crates/bytemuck) for this.
//!
//! <!-- TODO -->

Is this TODO deliberately left for completing in a later PR?

Member Author

Yeah, that was my approximate intention. I'll plan to follow up with that very soon after we land this, but I don't want to potentially block this entire PR on getting those reviewed now.

//!
//! # Crate Feature Flags
//!
//! <!-- TODO -->

Is this a deliberately left TODO?

Member Author

@DJMcNab DJMcNab Oct 24, 2025

Yeah, that's correct. This is something we'd update closer to release time - in particular, the std feature currently does nothing other than existing for forward-compatibility, so there aren't actually any meaningful feature flags.

Comment on lines 179 to 180
/// Note that a function only operating on 128 bytes is probably too small for checking
/// whether a token exists just for it is worthwhile.

Collaborator

Nit: grammar

Member Author

I just removed this block, as it's not really all that helpful.

Comment on lines +167 to +176
/// /// Perform some computation using SIMD.
/// #[target_feature(enable = "f1,f2")]
/// fn uses_simd(val: [f32; 4]) -> [f32; 4] {
/// // ...
/// }
///
/// let a = [1., 2., 3., 4.];
/// let Some(token) = token else { return scalar_fallback(a) };
///
/// trampoline!(Token = token => "f1,f2", uses_simd(a: [f32; 4]) -> [f32; 4])

Collaborator

I have a minor misunderstanding regarding why the token feature set needs to be declared "f1,f2" here, whilst the feature set is also declared on the uses_simd function?

When would the feature string passed to trampoline and the function target feature strings diverge? On this line what happens if in the trampoline call "f1,f2" is accidentally written as "f1"?

Similarly, I need clarification of the utility of multiple tokens. E.g. [Token = token, Sse = my_sse] => "f1,f2,sse"?

Collaborator

Similarly, I need clarification of the utility of multiple tokens. E.g. [Token = token, Sse = my_sse] => "f1,f2,sse"?

Attempt at answering my own question.
The list of tokens provided to trampoline is an explicit list of permitted/witnessed features. The function passed to trampoline (uses_simd in this example) has its required features declared in the target_feature attribute. Thus, trampoline enforces that it is only safe to call this function if the required features are a subset of the features witnessed by the provided tokens?

Very very cool.

Member Author

Yeah, exactly. This is due to our dependency on the target features 1.1 Rust feature, which moves that safety check into the Rust compiler.

And yes, you can use multiple tokens exactly as you describe. I'll see about adding some docs for that.

Member Author

To put this another way, declaring the target feature string in the macro body allows us to move the target features from being just an attribute to being both an attribute and a const value, which we can then perform validation on.
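
As a hedged sketch of that "const value" half, the macro below takes the feature string as a literal, binds it to a `const`, and validates it at compile time against a list of available features. It does not reproduce `trampoline!`: `checked_features!`, `features_covered` and the helper names are hypothetical, and the sketch assumes a comma-separated string with no whitespace.

```rust
/// Compare `haystack[start..end]` with `needle`, byte by byte.
const fn segment_eq(haystack: &[u8], start: usize, end: usize, needle: &[u8]) -> bool {
    if end - start != needle.len() {
        return false;
    }
    let mut i = 0;
    while i < needle.len() {
        if haystack[start + i] != needle[i] {
            return false;
        }
        i += 1;
    }
    true
}

/// Whether the feature name in `required[start..end]` is listed in `available`.
const fn segment_available(required: &[u8], start: usize, end: usize, available: &[&str]) -> bool {
    let mut i = 0;
    while i < available.len() {
        if segment_eq(required, start, end, available[i].as_bytes()) {
            return true;
        }
        i += 1;
    }
    false
}

/// Whether every comma-separated feature in `required` is listed in `available`.
/// Empty segments (including an empty string) are rejected, which errs on the safe side.
pub const fn features_covered(required: &str, available: &[&str]) -> bool {
    let bytes = required.as_bytes();
    let mut start = 0;
    let mut i = 0;
    while i <= bytes.len() {
        // A comma or the end of the string closes the current segment.
        if i == bytes.len() || bytes[i] == b',' {
            if !segment_available(bytes, start, i, available) {
                return false;
            }
            start = i + 1;
        }
        i += 1;
    }
    true
}

/// Hypothetical macro: yields the feature string, but only compiles if it is covered.
macro_rules! checked_features {
    ($required:literal, $available:expr) => {{
        const REQUIRED: &str = $required;
        const _: () = assert!(
            features_covered(REQUIRED, $available),
            "feature string is not covered by the available features"
        );
        REQUIRED
    }};
}
```

With this sketch, `checked_features!("f1,f2", &["f1", "f2"])` compiles and evaluates to `"f1,f2"`, while `checked_features!("f1,f2", &["f1"])` is rejected at compile time by the const assertion.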

//! This abstraction is designed to be combined with target features 1.1, the recent update
//! in the Rust compiler to allow calling `#[target_feature]` functions safely from within
//! other `#[target_feature]` functions.
//! As such, once you have used the [`trampoline!`] macro, you can call any intrinsic in [`core::arch`].

Collaborator

I think I am starting to understand the power of this abstraction. Per Stabilize target_feature_11 #134090, it is unsafe to call a function with target_feature declared unless the caller is in a context with those features. Thus the initial call into the target_feature context is unsafe. This trampoline! provides a safe alternative.
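
For contrast, here is a sketch of the manual pattern that a safe entry point like this would replace: detect the feature at runtime, then make the initial call inside an `unsafe` block. It is an illustration only. `mul_f32s` is a hypothetical wrapper, `core::mem::transmute` stands in for the bytemuck casts used elsewhere in this thread, and the function is declared `unsafe fn` so that the sketch does not depend on target features 1.1.

```rust
#[cfg(target_arch = "x86_64")]
mod x86 {
    use core::arch::x86_64::{__m128, _mm_mul_ps};
    use core::mem::transmute;

    /// Multiply four lanes with SSE.
    ///
    /// # Safety
    /// The caller must ensure the CPU supports SSE.
    #[target_feature(enable = "sse")]
    pub unsafe fn sse_mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
        // SAFETY: `[f32; 4]` and `__m128` are both 16 bytes of plain float data,
        // and SSE is enabled for this function.
        unsafe {
            let a: __m128 = transmute(a);
            let b: __m128 = transmute(b);
            transmute(_mm_mul_ps(a, b))
        }
    }
}

/// Hypothetical safe wrapper: uses SSE when detected, scalar code otherwise.
pub fn mul_f32s(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse") {
            // SAFETY: SSE support was checked on the line above.
            return unsafe { x86::sse_mul_f32s(a, b) };
        }
    }
    // Scalar fallback for other targets, or if SSE is not detected.
    [a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3]]
}
```

Here the link between the runtime check and the `unsafe` call exists only in a comment. The token-plus-macro approach discussed in this thread moves that link into the type system and a compile-time check.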

Collaborator

@ajakubowicz-canva ajakubowicz-canva Oct 24, 2025

I think the "glamour shot" described in comment #108 (comment) makes perfect sense to me now! I really like this PR. Apologies that it's taking me a while to get through. On Monday I plan to go through the code that's outside fearless_simd_core.

Rename `trampoline.rs` to `support.rs`
The old name conflicted with the name of the macro, leading to it being
harder to find the docs of the macro itself.

Remove unneeded reference

Remove entire note on 128 bytes being too small

The point it was making was:
- Fairly hard to explain
- Not necessarily true

Add a few more test cases

Co-authored-by: Taj Pereira <[email protected]>
Member Author

DJMcNab commented Oct 24, 2025

Thanks both for the excellent reviews - even though they clearly raced, all the comments were very helpful for improving things!

AndrewJakubowicz commented Oct 25, 2025

I had another thought after reviewing your other PR. I'm wondering if this is something that trampoline can express.

Trampoline is excellent because you can safely call into SIMD target-feature functions from callers that lack those target features. Is trampoline also useful to constrain target features? Imagine that you can use 512-bit vectors but there's some particular function that triggers the CPU to downclock. Could you use trampoline to narrow the target features when calling from a caller with more target features into a SIMD function, intentionally restricting the features available and avoiding a potential CPU downclock? I don't know how useful this would be in practice but I'm curious.

Great work by the way!

Edit: on further thought I realize this is a gap in my target features understanding.

Member Author

DJMcNab commented Oct 27, 2025

Is trampoline also useful to constrain target features? Imagine that you can use 512-bit vectors but there's some particular function that triggers the CPU to downclock. Could you use trampoline to narrow the target features when calling from a caller with more target features into a SIMD function, intentionally restricting the features available and avoiding a potential CPU downclock? I don't know how useful this would be in practice but I'm curious.

There are a few things to say here, to hopefully help guide your understanding:

  1. As I understand it, this wouldn't need trampoline at all; you should be able to safely call the narrowed function; this is again due to target features 1.1.
  2. In the current design of fearless_simd, the target features of the caller don't impact the codegen (except insofar as the autovectoriser is concerned); instead, it depends on which implementation of Simd you use.
  3. The target_feature attribute does impact codegen if you're using std::simd, but that's nightly only. Incidentally, we could make a Simd implementation on top of std::simd.
